Hyper-Decision Transformer

Abstract

Decision Transformers (DT) have demonstrated strong performance in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. To achieve this goal, we augment the base DT with an adaptation module whose parameters are initialized by a hyper-network. When encountering an unseen task, the hyper-network takes a handful of demonstrations as input and initializes the adaptation module accordingly. This demonstration-conditioned initialization enables HDT to adapt efficiently to novel tasks by finetuning only the adaptation module, which accounts for a small fraction of the DT model's parameters. We validate HDT's generalization capability on object manipulation tasks and demonstrate two advantages. 1) With a single demonstration containing expert actions, HDT matches the converged performance of finetuning the whole DT model while requiring far less computation, finetuning only 0.5% of the DT model's parameters and converging faster. 2) When the demonstration contains only observed states and rewards, HDT can still solve unseen tasks with a small number of environment rollouts, outperforming state-of-the-art baselines in task success rate by a large margin.

Hyper-Decision Transformer

To facilitate the adaptation of large-scale transformer agents, we propose Hyper-Decision Transformer (HDT), a Transformer-based architecture that maintains data and parameter efficiency during online adaptation from observations. HDT consists of three key modules: a base DT model encoding knowledge shared across multiple tasks, a hyper-network acting as a task-specific meta-learner, and an adaptation module updated to solve downstream unseen tasks. Like the Decision Transformer, HDT takes recent context as input and outputs fine-grained actions. To encode task-specific information, HDT injects adapter layers into each decoder block. Each adapter layer's parameters are generated by a stand-alone hyper-network that takes as input both action-free demonstrations and the decoder block's layer index.
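The following PyTorch sketch illustrates one plausible way to wire these pieces together: a hyper-network maps an action-free demonstration embedding and a decoder-layer index to the weights of a bottleneck adapter applied inside each decoder block. This is a minimal sketch, not the authors' implementation; the module names, dimensions, pooling of demonstration tokens, and the bottleneck-adapter form are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperNetwork(nn.Module):
    """Maps a pooled demonstration embedding plus a decoder-layer index
    to the flat parameter vector of one bottleneck adapter (assumed form)."""

    def __init__(self, demo_dim, hidden_dim, d_model, bottleneck, n_layers):
        super().__init__()
        self.layer_emb = nn.Embedding(n_layers, demo_dim)
        n_params = 2 * d_model * bottleneck + bottleneck + d_model
        self.net = nn.Sequential(
            nn.Linear(demo_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, n_params),
        )
        self.d_model, self.bottleneck = d_model, bottleneck

    def forward(self, demo_emb, layer_id):
        z = demo_emb + self.layer_emb(layer_id)      # condition on the layer index
        flat = self.net(z).squeeze(0)
        d, b = self.d_model, self.bottleneck
        w_down, rest = flat[:d * b].view(b, d), flat[d * b:]
        b_down, rest = rest[:b], rest[b:]
        w_up, b_up = rest[:b * d].view(d, b), rest[b * d:]
        return w_down, b_down, w_up, b_up


def adapter(x, params):
    """Bottleneck adapter with a residual connection; weights come from the hyper-network."""
    w_down, b_down, w_up, b_up = params
    h = F.relu(F.linear(x, w_down, b_down))          # down-project to the bottleneck
    return x + F.linear(h, w_up, b_up)               # up-project and add the residual


# Usage: generate per-layer adapter weights once per task from an action-free demo,
# then apply them inside each (frozen) DT decoder block.
d_model, bottleneck, n_layers = 128, 16, 4
hyper = HyperNetwork(demo_dim=128, hidden_dim=256,
                     d_model=d_model, bottleneck=bottleneck, n_layers=n_layers)
demo_emb = torch.randn(1, 128)                       # pooled (state, reward) demo tokens
x = torch.randn(1, 20, d_model)                      # hidden states inside a decoder block
for layer_id in range(n_layers):
    params = hyper(demo_emb, torch.tensor([layer_id]))
    x = adapter(x, params)
```

During adaptation, only the adapter parameters (or the hyper-network outputs) would be finetuned while the base DT stays frozen, which is what keeps the trainable fraction small.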

Demos for Ablation Studies

We investigate whether the trained hyper-network encodes meaningful task-specific information by visualizing environment rollouts. We compare our proposed HDT, which initializes adapter layers with the trained hyper-network, against HDT-Rand, which initializes adapter layers randomly. Note that we only show rollouts on testing tasks without any fine-tuning. Visualizations on five testing tasks show that adapters initialized by the hyper-network provide a strong policy prior.
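A hedged sketch of this ablation protocol, assuming a simplified interface in which `policy`, `hyper_net`, `env`, and `demo_embedding` are hypothetical stand-ins for the paper's components: the frozen policy is rolled out once with hyper-network-initialized adapter parameters (HDT) and once with randomly drawn parameters of the same shapes (HDT-Rand), with no fine-tuning in either case.

```python
import torch


def rollout(policy, env, adapter_params, max_steps=500):
    """Run one episode with fixed adapter parameters and no gradient updates."""
    obs, episode_return = env.reset(), 0.0
    with torch.no_grad():
        for _ in range(max_steps):
            action = policy.act(obs, adapter_params)   # frozen base DT + adapters
            obs, reward, done, _ = env.step(action)
            episode_return += reward
            if done:
                break
    return episode_return


def compare_initializations(policy, hyper_net, env, demo_embedding):
    """Return episode returns for hyper-network-initialized vs. random adapters."""
    hdt_params = hyper_net(demo_embedding)             # HDT: demo-conditioned adapters
    rand_params = [torch.randn_like(p) * 0.02          # HDT-Rand: random adapters,
                   for p in hdt_params]                # same shapes as the HDT ones
    return rollout(policy, env, hdt_params), rollout(policy, env, rand_params)
```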

[Rollout videos: five testing tasks (Bin-picking, Box-close, Door-lock, Door-unlock, Hand-insert), each shown with HDT initialization and HDT-Rand initialization.]